HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment1
نویسندگان
چکیده
PREMISE OF THE STUDY Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). METHODS AND RESULTS HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. CONCLUSIONS HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper.
منابع مشابه
Target Capture and Massive Sequencing of Genes Transcribed in Mytilus galloprovincialis
Next generation sequencing (NGS) allows fast and massive production of both genome and transcriptome sequence datasets. As the genome of the Mediterranean mussel Mytilus galloprovincialis is not available at present, we have explored the possibility of reducing the whole genome sequencing efforts by using capture probes coupled with PCR amplification and high-throughput 454-sequencing to enrich...
متن کاملPlastome Rearrangements in the “Adenocalymma-Neojobertia” Clade (Bignonieae, Bignoniaceae) and Its Phylogenetic Implications
The chloroplast is one of the most important organelles of plants. This organelle has a circular DNA with approximately 130 genes. The use of plastid genomic data in phylogenetic and evolutionary studies became possible with high-throughput sequencing methods, which allowed us to rapidly obtain complete genomes at a reasonable cost. Here, we use high-throughput sequencing to study the "Adenocal...
متن کاملRAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and t...
متن کاملOp-molb140061 1077..1088
Studies of microbial evolutionary dynamics are being transformed by the availability of affordable high-throughput sequencing technologies, which allow whole-genome sequencing of hundreds of related taxa in a single study. Reconstructing a phylogenetic tree of these taxa is generally a crucial step in any evolutionary analysis. Instead of constructing genome assemblies for all taxa, annotating ...
متن کاملSupersplat—spliced RNA-seq alignment
MOTIVATION High-throughput sequencing technologies have recently made deep interrogation of expressed transcript sequences practical, both economically and temporally. Identification of intron/exon boundaries is an essential part of genome annotation, yet remains a challenge. Here, we present supersplat, a method for unbiased splice-junction discovery through empirical RNA-seq data. RESULTS U...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 4 شماره
صفحات -
تاریخ انتشار 2016